
Fixes #634 - Implemented data entry date option for TS data retrieval #927

Open
wants to merge 7 commits into base: develop
Conversation

zack-rma
Collaborator

Fixes #634 - Implements data entry date as an option for TimeSeries data retrieval - Serialization bug in progress

@@ -84,7 +83,7 @@

public class TimeSeriesController implements CrudHandler {
private static final Logger logger = Logger.getLogger(TimeSeriesController.class.getName());

private static final String INCLUDE_ENTRY_DATE = "include-entry-date";
Contributor

From a purely pedantic standpoint this should really be a parameter of the content-type, but I may have to accept the reality of this being easier for everyone.

@krowvin, we were just discussing this conceptually.

Collaborator

Path parameters are the "what" I am retrieving, query parameters are the "how" I am adding to or filtering that data, and content-type is the "shape" that is returned. This change affects the how, and given the way the column names and data array are already paired together to give this flexibility, I don't see a change in shape.

Contributor

Adding a column is definitely a change in shape. The what is a time series, the query parameters specify exactly which time series, or at least which portion of a given time series (I guess if we're being really pedantic begin and end should be in the fragment... but I digress).

While the flexibility is there, it's flexibility to change the shape. I don't totally disagree with you but given we haven't communicated that portion of the contract very well we are introducing a breaking change. We already have more than one downstream library dependent on these types.

I'm going to type up something on the wiki, or maybe a discussion, for the philosophy I'm going for with these; hopefully my argument makes more sense in regard to query vs content-type, especially as it relates to some of the challenges we're currently seeing.

Collaborator

Yeah, it is not conveyed very well that one should use the columns array to determine which index to grab from the data array. I don't think that adding information to the swagger docs to communicate that should trigger a content-type change though (maybe version=2.5 but that seems like a huge headache) especially since not including the parameter to retrieve entry date keeps the array intact for backwards compatibility.

Also, I've never seen an API where the content-type changes how much extra (or how little) data gets returned to the client. I'd like to see some examples. I also don't see the reason (pedantic or not) for adding begin/end as path parameters, as those are filters on the time series. Everything in the identifier of the time series encompasses the time series (which is also why the date version shouldn't be in the path and is a query parameter).

Contributor

NOTE: not arguing for it, but an explanation of my logic on the fragment.
A time series is:
/timeseries/Alder Springs.Precip-INC.Total.1Hour.1Hour.Calc?version_date=unversioned identifies a specific time series, which is a mathematical series of data - and technically all of it. The entire series could be considered a "document"; a fragment is a section within a document. Traditionally we would think of lines in a file, but it applies to the time series as well. A file, after all, is just a series of lines.

As I said, extremely pedantic.... admittedly almost to the point of being useless because literally no one does it that way, nor would they even if it could be proven objectively correct.

Technically the units are also representation and not identification, but given how little content-type features are used, it would be incredibly difficult to get people to use it; I don't even think the Swagger-UI has a mechanism to slightly tweak the content-type.

But back on the topic of what's correct for us, it seems we're all in agreement that @DanielTOsborne's initial design is already sufficiently flexible in the current scheme, and our failure was in how we documented that for the general end user.

So we leave the inclusion as a query parameter unless a better way actually comes along.

@@ -248,6 +256,11 @@ public static class Record {
@JsonProperty(value = "quality-code", index = 2)
int qualityCode;

Contributor

This might be better as a subclass. I know that adds some complexity, but eventually we may also want to include the version date and any attached text notes, and that would be a lot of logic in this one class.

Collaborator Author

Changed to a subclass. I added a custom deserializer to handle the different classes.

Contributor

Okay, so I said this, Adam said this, but after reading the code and our other discussions above, does it make more sense to just make TimeSeries more generic?

e.g.

A builder where you manually add column names, index, and type, and add functions in the row builder to set them?
something like

withColumn(int index, String name, String description, Class<T> type) {
"logic"
}

... Record:

<T> setColumn(int index, T value, Class<T> type) {
  "logic"
}

Or something like that. It would remove the need for TimeSeriesDaoImpl to have two different loops doing almost 90% of the same work, just a check for "I have this column requested, let's also add it."

Basically, instead of hard-coding the columns at all (okay, maybe time... it is a time series), the user of the given TimeSeries object (after it's set by the builder) can define them at run-time.

Sorry, I know you did a lot here, that just came to me now looking through the current PR.
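The builder idea above could be sketched roughly as follows. All names here (GenericTimeSeries, Column, withColumn, setColumn) are hypothetical illustrations of the comment's pseudocode, not the PR's actual API: columns are registered once at run-time, and rows set values by column index with a type check.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GenericTimeSeries {

    public static final class Column {
        final int index;
        final String name;
        final String description;
        final Class<?> type;

        Column(int index, String name, String description, Class<?> type) {
            this.index = index;
            this.name = name;
            this.description = description;
            this.type = type;
        }
    }

    private final Map<Integer, Column> columns = new LinkedHashMap<>();
    private final List<Object[]> rows = new ArrayList<>();

    // Builder-side: register column metadata once, at run-time.
    public GenericTimeSeries withColumn(int index, String name, String description, Class<?> type) {
        columns.put(index, new Column(index, name, description, type));
        return this;
    }

    // Row-side: set a value by column index, with a type check against the registered column.
    public <T> GenericTimeSeries setColumn(Object[] row, int index, T value, Class<T> type) {
        Column col = columns.get(index);
        if (col == null || !col.type.isAssignableFrom(type)) {
            throw new IllegalArgumentException("no column " + index + " accepting " + type);
        }
        row[index] = value;
        return this;
    }

    public Object[] newRow() {
        Object[] row = new Object[columns.size()];
        rows.add(row);
        return row;
    }

    public List<Object[]> rows() {
        return rows;
    }
}
```

With a shape like this, the optional entry-date column would only be registered when requested, so the DAO would need a single loop rather than two near-identical ones.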

…dle custom output, adds data entry date support
@zack-rma zack-rma marked this pull request as ready for review October 25, 2024 18:53
@zack-rma zack-rma changed the title Fixes #634 - Implemented data entry data option for TS data retrieval Fixes #634 - Implemented data entry date option for TS data retrieval Oct 31, 2024
@MikeNeilson
Contributor

@jbkolze what are your thoughts on this? At least how it's done. Some of it appears it may be a breaking change, which we want to avoid.

@@ -623,7 +622,7 @@ private static TimeSeries buildTimeSeries(ILocationLevelRef levelRef, Interval i
if (qualityCode != null) {
quality = qualityCode.intValue();
}
timeSeries.addValue(dateTime, value, quality);
timeSeries.addValue(dateTime, value, quality, null);
Contributor

Are we sure this isn't a breaking change?

Collaborator Author

I'll double check with additional test cases, but this change should not break any existing functionality. A null data entry date parameter is treated as if a standard Time-Value-Quality data entry was provided. The implementation of addValue will use a TimeSeries data record with only three input fields under normal circumstances:

if (dataEntryDate != null) {
    values.add(new TimeSeriesRecordWithDate(dateTime, value, qualityCode, dataEntryDate));
} else {
    values.add(new Record(dateTime, value, qualityCode));
}

The existing use cases of TimeSeries should be unaffected, as they will be handled exactly as they were before these changes.

Collaborator

I recommend subclassing TimeSeries instead

Contributor

I think it would break CWMS.js; the JavaScript OpenAPI generator already has trouble with our TimeSeries class given some specific assumptions the generator chose to make.

@jbkolze

jbkolze commented Nov 4, 2024

@jbkolze what are your thoughts on this? At least how it's done. Some of it appears it may be a breaking change, which we want to avoid.

Conceptually, I don't personally have any qualms with it. You had mentioned in the typing discussion that you all were trying to leave flexibility for dynamically adjusting the time series values array, and this seems like a prime use case for that.

That being said, I am a little concerned about the part you marked as a breaking change. I don't know the CDA source that well, but is this indicating that the response would include a fourth null value even if the include_entry_date is false? Because that definitely would not be ideal -- we've written a lot of CDA code already (and I think other districts have as well) that would have to be updated. Not as big of a deal if an API version were included in the path (.../v3/...), but somewhat cumbersome in the current setup.

My understanding from previous conversations is that you'd get the normal 3-value array if this were set to false, but receive a 4-value array if include_entry_date is true. And the "value-columns" object would be updated to match. That'd be my "ideal" implementation.

@zack-rma
Collaborator Author

zack-rma commented Nov 4, 2024

You are correct that setting the include_entry_date parameter to true would result in a four-value array, whereas setting it to false would return a three-value array. Your "ideal" implementation is what I was aiming for to retain backwards compatibility and avoid breaking any of the other endpoints that rely on the TimeSeries implementation.

tsRecord.getValue(qualityNormCol).intValue()
)
);
if (includeEntryDate) {
Collaborator

This query is doubling the time it takes to retrieve time series. Can this replace the retrieve_ts_out_tab calls?

Collaborator Author

While it could replace the retrieve_ts_out_tab call above, doing so would require implementing trim support in the query, as that is currently handled by the retrieve_ts_out_tab call. I haven't quite figured out the best way to do so, so maybe we can discuss this in more detail.

@@ -159,7 +164,8 @@ public ZonedDateTime getEnd() {
}

// Use the array shape to optimize data transfer to client
@JsonFormat(shape=JsonFormat.Shape.ARRAY)
@JsonFormat(shape = JsonFormat.Shape.ARRAY)
@JsonDeserialize(contentUsing = TimeSeriesRecordDeserializer.class)
Collaborator

This method is overridden for XML using a Mixin; did you verify that behavior still works as intended?

Collaborator Author

I added a Mixin test that verifies that the XML tags for the value records have the appropriate labels. There are also a couple serialization/deserialization tests that verify that this works as intended.
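For context on why the array shape keeps coming up in this thread, here is a minimal sketch of what Jackson's Shape.ARRAY annotation does. The class, field names, and sample values are hypothetical stand-ins for the PR's actual Record, not its real code:

```java
import com.fasterxml.jackson.annotation.JsonFormat;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ArrayShapeDemo {

    // Shape.ARRAY serializes the POJO as a positional array instead of an
    // object, which is why adding a column changes the client-visible shape.
    @JsonFormat(shape = JsonFormat.Shape.ARRAY)
    @JsonPropertyOrder({"date-time", "value", "quality-code"})
    public static class Record {
        @JsonProperty(value = "date-time", index = 0)
        public long dateTime = 1700000000123L;
        @JsonProperty(value = "value", index = 1)
        public double value = 4.5;
        @JsonProperty(value = "quality-code", index = 2)
        public int qualityCode = 0;
    }

    public static void main(String[] args) throws Exception {
        // Emits positions, not names: [1700000000123,4.5,0]
        System.out.println(new ObjectMapper().writeValueAsString(new Record()));
    }
}
```

Clients must pair the positions against the "value-columns" descriptor to know which index is which, which is the documentation gap discussed above.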


@JsonProperty(value = "value-columns")
@Schema(name = "value-columns", accessMode = AccessMode.READ_ONLY)
public List<Column> getValueColumnsJSON() {
return getColumnDescriptor();
return getColumnDescriptor((values != null && !values.isEmpty())
Collaborator

This seems like behavior for a subclass

Collaborator Author

Added to TimeSeries subclass

@@ -218,7 +228,16 @@ private List<Column> getColumnDescriptor() {
columns.add(new TimeSeries.Column(fieldName, fieldIndex + 1, f.getType()));
}
}

if (includeDataEntryDate) {
Collaborator

This could also be accomplished better with a subclass

Collaborator Author

Moved into subclass


// This class is used to deserialize the time-series data JSON into an object
// Solves the issue of the deserializer getting stuck in a loop
// and throwing a StackOverflowError when trying to handle the Record class directly
Collaborator

This seems sketchy to me, why is your custom deserializer causing this?

Collaborator Author

Removed custom deserializer

return jsonParser.getCodec().treeToValue(node, TimeSeriesRecordWithDate.class);
}
String nodeString = node.toString();
if (nodeString.startsWith("[")) {
Collaborator

A mixin doesn't solve the need for this custom parsing? All this logic looks like we're circumventing Jackson too much.

Collaborator Author

Removed custom serializer

Timestamp dateTime = Timestamp.from(Instant.ofEpochMilli(Long.parseLong(valList[0])));
double value = Double.parseDouble(valList[1]);
int quality = Integer.parseInt(valList[2]);
Timestamp entryDate = Timestamp.from(Instant.ofEpochMilli(Long.parseLong(valList[3])));
Collaborator

No need to convert from Instant to Timestamp; Timestamp's constructor takes epoch millis.
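A quick sketch of the simplification the reviewer is suggesting (the epoch-millis value is arbitrary; variable names are illustrative):

```java
import java.sql.Timestamp;
import java.time.Instant;

public class TimestampConstruction {
    public static void main(String[] args) {
        long epochMillis = 1_700_000_000_123L;
        // Round-trip through Instant, as in the PR code under review
        Timestamp viaInstant = Timestamp.from(Instant.ofEpochMilli(epochMillis));
        // Direct construction: java.sql.Timestamp(long) already takes epoch millis
        Timestamp direct = new Timestamp(epochMillis);
        System.out.println(viaInstant.equals(direct)); // true: same instant, same nanos
    }
}
```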

@Override
public TimeSeries.Record deserialize(JsonParser jsonParser, DeserializationContext deserializationContext) throws IOException {
JsonNode node = jsonParser.readValueAsTree();
if (node.get("data-entry-date") != null) {
Collaborator

This should be a constant

Contributor

It should probably also be ignored on input; data-entry-date is always set by the database itself, and an external system/user isn't allowed to change it.
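One way to get that behavior is Jackson's read-only property access, which serializes the field to clients but silently drops it on deserialization. A minimal sketch, with an illustrative class and a simplified long-valued field rather than the PR's actual record:

```java
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;

public class EntryDateRecord {
    // READ_ONLY: written when serializing to the client, ignored on input,
    // so an external system/user cannot set the database-managed field.
    @JsonProperty(value = "data-entry-date", access = JsonProperty.Access.READ_ONLY)
    public long dataEntryDate;

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        EntryDateRecord in = mapper.readValue("{\"data-entry-date\": 42}", EntryDateRecord.class);
        System.out.println(in.dataEntryDate); // 0: the inbound value was ignored
    }
}
```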


@MikeNeilson
Contributor

My understanding from previous conversations is that you'd get the normal 3-value array if this were set to false, but receive a 4-value array if include_entry_date is true. And the "value-columns" object would be updated to match. That'd be my "ideal" implementation.

That is correct for the location indicated, which is location levels backed by a time series. I don't know how much it would affect what you're currently using, but it's definitely not ideal. At the least it definitely shouldn't be null; we may as well provide the data, but it seems like a parameter should be added to match on the location level endpoint.

But it sounds like you're on the same page with Adam about the shape already being explicitly flexible, so I'm okay with that section now; not "ideal", but what is? Definitely something to document better, though.

});
logger.fine(() -> query2.getSQL(ParamType.INLINED));
final TimeSeriesWithDate timeSeries = new TimeSeriesWithDate(timeseries);
query2.forEach(tsRecord -> timeSeries.addValue(
Contributor

We don't need to solve this now, but definitely should before we add requesting any text notes as well. There has to be a better way to handle this with the builders.

if (pageSize != 0) {
if (versionDate != null) {
whereCond = whereCond.and(AV_TSV_DQU.AV_TSV_DQU.VERSION_DATE.eq(versionDate == null ? null :
Contributor

Does this logic handle max version? Or will AV_TSV_DQU always return every version, or only the specifically requested version?


Successfully merging this pull request may close these issues.

Need Data Entry Date for Time Series Data
4 participants